# Overview ---------------------------------------------------------------------

This README page details the replication archive for “Marketing Taxation? 
Experimental Evidence on Enforcement and Bargaining in Malawian Markets.”

The code in this replication archive replicates the analysis presented in the
main paper and the appendix. One main file runs all code. The code cleans data 
from 4 surveys carried out for this project, 2 collections (folders) of data 
from the implementing partner, and 9 hand-coded data sets from data provided by
the implementing partner. It also reproduces the 6 data-presenting tables in the
main paper and the 3 figures and 54 tables in the supplementary appendix. 

The R code was last run on a desktop computer with an AMD Ryzen 5 5600X 6-core processor,
16 GB of RAM, and Windows 11 Education version 23H2. On a similar machine the 
replicator should expect the code to run for about 35 minutes (the code is 
neither memory nor processor demanding).

The unzipped contents of this replication archive will take up approximately 
270 MB of disk space.

# Replication Archive Contents ------------------------------------------------------------

The list below covers all files and folders in the replication archive, 
including documentation, code files, and data files, with a brief 
description of each. Note that some data and output folders contain so many 
files of similar content that their contents are described in general, not file
by file.

Folder names in this list are followed by a /; file names are wrapped in quotes
(e.g., "file_name.extension").

- data/: folder that contains all data (raw, cleaned, formatted) required to
replicate analysis
  - 00_instruments/: folder that contains the survey instruments for the
  tax collector and market vendor surveys at baseline and endline
    - "Market Scoping Data Collection Instrument.pdf": PDF version of paper 
    questionnaire used to collect market scoping data. 
    - "TaxCollectorBase.xlsx": Survey CTO instrument Excel export for the baseline
    tax collector survey
    - "TaxCollectorEnd.xlsx": Survey CTO instrument Excel export for the endline
    tax collector survey
    - "VendorBase.xlsx": Survey CTO instrument Excel export for the baseline 
    market vendor survey
    - "VendorEnd.xlsx": Survey CTO instrument Excel export for the endline market
    vendor survey
  - 0_codebook/: folder that contains codebook file for all raw data files
    - "codebooks.html": file that contains codebooks (variable-description pairs)
    for all raw data sets.
  - 1_raw/: folder that contains all non-processed (by the research team)
  data files
    - DAIDataExchange/: folder that contains the month-by-month implementation 
    tracking and market revenue for all treatment markets provided by the 
    implementing partner DAI International (DAI). Contains 14 folders (1 for each
    month of the intervention tracking period, November 2017 to December 2018) that
    each contain 8 Excel files (1 per treatment district) for a total of 120 files.
    - Mobile Money/: folder that contains information on the mobile money treatment
      - "Market Names and Numbers.csv": file to help identify market transactions
      in "Mobile Money Transactions Compiled.csv"
      - "Mobile Money Transactions Compiled.csv": data file that contains all 
      recorded Mobile Money transactions
    - "alt_names.csv": file with alternate names for sample markets
    - "block_ids.dta": file with information about block IDs for sample markets
    due to blocked treatment assignment
    - "incentives_check_data.xlsx": file that contains data on the 
    tax collector incentive component of the Top-Down treatment, hand-coded 
    partially by DAI and by the research team based on data provided by DAI
    - "market_month_data.csv": data on implementation in each market on a 
    month-by-month basis (over the intervention tracking period), hand-coded by
    the research team based on data provided by DAI and Innovations for Poverty
    Action (IPA), the survey and monitoring partner.
    - "market_only_data.xlsx": intervention-period (i.e. non-time varying) data on 
    implementation in each market, hand-coded by research team based on data 
    provided by DAI and IPA. 
    - "market_scoping_data.csv": file that contains scoping (pre-intervention) 
    information on all markets considered for the study. Data collected by IPA.
    - "marketvendor_BASELINEFINAL_noID_clean_v5.dta": preliminarily cleaned
    (via "scripts/1_cleaning/00cleaning_baseline_prelim.do") version of baseline
    market vendor survey data
    - "marketvendor_BASELINEFINAL_noID_raw.dta": raw de-identified data file for
    baseline market vendor survey (both long and short versions; collected by 
    IPA, who also did some basic, non-research-purpose cleaning)
    - "othr_mkts_base.csv": lists the other markets at which vendors sold at
    baseline. Derived from information in
    "marketvendor_BASELINEFINAL_noID_raw.dta"
    - "othr_mkts_end.csv": lists the other markets at which vendors sold at
    endline. Derived from information in "tad_endline_market_long_nopii.dta" and 
    "tad_endline_market_short_nopii.dta"
    - "tad_endline_market_long_nopii.dta": raw de-identified data file for the 
    long version of the endline market vendor survey (collected by IPA, who also
    did some basic, non-research purpose cleaning)
    - "tad_endline_market_short_nopii.dta": raw de-identified data file for the 
    short version of the endline market vendor survey (collected by IPA, who also
    did some basic, non-research purpose cleaning)
    - "tad_endline_tax_collector_nopii.dta": raw de-identified data file for the
    endline tax collector survey (collected by IPA, who also did some basic, 
    non-research purpose cleaning) 
    - "taxcollector_BASELINEFINAL.dta": raw de-identified data file for the
    baseline tax collector survey (collected by IPA, who also did some basic, 
    non-research purpose cleaning) 
    - "treatment_groups_raw.csv": file that contains treatment group assignment
    for each sample market.
  - 2_clean/: folder that contains cleaned data files (produced by scripts in
  scripts/1_cleaning/)
    - "market_only_data.RData": cleaned version of "market_only_data.xlsx"
    - "mkt_scoping_full.RData": cleaned version of "market_scoping_data.csv"
    - "mkt_scoping_sample.RData": cleaned version of "market_scoping_data.csv" 
    with non-sample markets excluded
    - "mobile_money.RData": cleaned version of data in Mobile Money/
    - "month_data_merged.RData": combined and cleaned data from DAIDataExchange/,
    "mobile_money.RData", and "incentives_check_data.xlsx"
    - "tax_collector_base.RData": cleaned version of baseline tax collector survey
    data
    - "tax_collector_end.RData": cleaned version of endline tax collector survey
    data
    - "treatment_groups.csv": cleaned version of "treatment_groups_raw.csv"
    - "vendor_base.RData": fully cleaned version of baseline market vendor survey
    data
    - "vendor_end.RData": cleaned version of endline market vendor survey data
  - 3_formatted/: folder that contains data created from cleaned data files
    - "market_lvl.RData": market-level outcome data, for market-level analyses
    - "market_lvl_diffs.RData": market-level differences (Endline - Baseline) for
    outcome measures
    - "market_month_additional.RData": contains several data objects that are
    versions of the data stored in "month_data_merged.RData", created to
    simplify parts of the analysis.
    - "spillovers.RData": contains a series of data objects necessary for the 
    Treatment Externalities and IPW spillover analyses.
- output/: folder that contains all table (.tex) and figure (.png) output produced
by the analysis code (that is, the tables and figures that appear in the paper
and supplementary appendix).
  - appendix/: folder that contains appendix output
    - "figureE1.png", "figureN1.png", "figureN2.png": image files for
    figures 
    - "tableD1.tex" through "tableO6.tex" (55 files): .tex files for all
    54 tables in the supplementary appendix (table N1 has two panels that are
    saved separately and were combined by the research team in the paper itself)
  - main/: folder that contains main analysis output
    - "main_analysis_models.RData": file that contains all R model objects from
    the main paper analysis. Required for some of the additional analysis in 
    the supplementary appendix
- scripts/: folder that contains all code files
  - 00_misc/: folder that contains setup scripts
    - "install_packages.R": R script that installs all packages required to execute
    the code in the replication archive. **Installs software!**
  - 0_functions/: folder that contains scripts that define functions that
  facilitate different aspects of the replication process
    - "functions_analysis.R": R script that defines convenience functions that
    help with analysis
    - "functions_cleaning.R": R script that defines convenience functions that
    help with general data cleaning
    - "functions_data_exchange.R": R script that defines convenience functions
    that help with the cleaning of the Data Exchange material in 
    data/1_raw/DAIDataExchange/
  - 1_cleaning/: folder that contains scripts that clean raw data and store
  cleaned versions.
    - "00cleaning_baseline_prelim.do": Stata .do script that performs preliminary
    cleaning on the baseline market vendor survey. Creates 
    "data/1_raw/marketvendor_BASELINEFINAL_noID_clean_v5.dta"
    - "00cleaning_treatment_groups.R": R script that cleans the treatment group
    information. Creates "data/2_clean/treatment_groups.csv"
    - "01cleaning_baseline.R": R script that cleans the baseline market vendor
    survey data. Creates "data/2_clean/vendor_base.RData"
    - "01cleaning_endline.R": R script that cleans the endline market vendor
    survey data. Creates "data/2_clean/vendor_end.RData"
    - "01cleaning_mobile_money.R": R script that cleans the mobile money
    data. Creates "data/2_clean/mobile_money.RData"
    - "01cleaning_scoping.R": R script that cleans the pre-intervention scoping
    data. Creates "data/2_clean/mkt_scoping_full.RData" and 
    "data/2_clean/mkt_scoping_sample.RData"
    - "01cleaning_tax_col_baseline.R": R script that cleans the baseline tax
    collector survey data. Creates "data/2_clean/tax_collector_base.RData"
    - "01cleaning_tax_col_endline.R": R script that cleans the endline tax
    collector survey data. Creates "data/2_clean/tax_collector_end.RData"
    - "02cleaning_market_month.R": R script that cleans the market month data.
    Creates "data/2_clean/month_data_merged.RData"; takes approximately 8 minutes
    to run
    - "03cleaning_market_only.R": R script that cleans the market only data.
    Creates "data/2_clean/market_only_data.RData"
  - 2_formatting/: folder that contains scripts that format cleaned data
    - "01formatting_market_month.R": R script that creates versions of the market
    month data for ease of analysis. Creates
    "data/3_formatted/market_month_additional.RData"
    - "01formatting_mkt_lvl.R": R script that aggregates outcome variables to
    market level for market-level analyses. Creates 
    "data/3_formatted/market_lvl.RData" and 
    "data/3_formatted/market_lvl_diffs.RData"
    - "02formatting_spillover.R": R script that creates 
    "data/3_formatted/spillovers.RData"; takes approximately 27 minutes to run
  - 3_analysis/: folder that contains analysis scripts
    - "analysis_appendix.R": R script that creates the output in output/appendix/,
    replicating all analysis presented in the supplementary appendix
    - "analysis_main.R": R script that creates the output in output/main/, 
    replicating all analysis shown in the paper 
- "readme.txt": text file containing description of replication archive contents
and instructions for replicating analytic results
- "replicate_all.R": R script that runs all R script files in the proper order
in order to replicate all analysis for the paper and supplementary appendix
- "replication_marketing_taxation.Rproj": R project file. All file paths in
scripts are relative, so this project file **must** be open in RStudio for the
code to execute successfully. 

Existing Output Material -------------------------------------------------------

When the replication archive is downloaded, the output/ folder already 
contains .tex files for all tables and .png files for all figures in the paper
(Table 2 through Table 7 in output/main/) and the supplementary appendix (Figure
E1 through Figure N2 and Table D1 through Table O6 in output/appendix/). They can
be recreated by running the two scripts scripts/3_analysis/analysis_main.R and 
scripts/3_analysis/analysis_appendix.R.

Replication from Raw Data (Except .Do File) ------------------------------------

To replicate the project in an automated way (except for the one .do file that
does some preliminary cleaning of the baseline market vendor survey) from the 
raw data files:

1. Download at least the following folders (and their contents) and files:
       - data/1_raw/
       - scripts/00_misc/
       - scripts/0_functions/
       - scripts/1_cleaning/
       - scripts/2_formatting/
       - scripts/3_analysis/
       - replicate_all.R
       - replication_marketing_taxation.Rproj
2. Open replication_marketing_taxation.Rproj in RStudio.
3. If necessary, install required packages by sourcing 
scripts/00_misc/install_packages.R. This will install packages on your computer.
4. Run the code in replicate_all.R.
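
As a rough sketch of the steps above, the R snippet below checks that the
folders and files listed in step 1 are present before anything is run. The
helper name missing_archive_paths() is hypothetical and not part of the
archive; the paths are the ones listed in step 1, and the check assumes the
working directory is the project root.

```r
# Hypothetical helper (not part of the archive): returns the required
# paths from step 1 that are missing under `root`.
missing_archive_paths <- function(root = ".") {
  required <- c(
    "data/1_raw",
    "scripts/00_misc",
    "scripts/0_functions",
    "scripts/1_cleaning",
    "scripts/2_formatting",
    "scripts/3_analysis",
    "replicate_all.R",
    "replication_marketing_taxation.Rproj"
  )
  # file.exists() is TRUE for both files and directories.
  required[!file.exists(file.path(root, required))]
}

# Usage, run from the project root (the folder containing the .Rproj file):
# missing <- missing_archive_paths()
# if (length(missing) == 0) {
#   source("scripts/00_misc/install_packages.R")  # installs packages
#   source("replicate_all.R")
# } else {
#   stop("Missing: ", paste(missing, collapse = ", "))
# }
```

The usage lines are commented out so that the helper can be defined and
inspected without starting the full replication run.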

.DO FILE EXPLANATION: When we initially started analytic work on this project, 
right after baseline data collection concluded (late 2017), we were working
in Stata. We very soon switched to R before any actual analysis was done (which
had to wait until the end of endline data collection in January 2019), but the 
cleaning of the baseline survey was started in Stata (and later finished in R).
We never ported the initial cleaning done in Stata to R, which is why this
one .do file exists in an otherwise entirely R-based project.

NOTE: You can expect the replication process to take about 35-40 minutes.
The two most time-demanding scripts are:
    - scripts/1_cleaning/02cleaning_market_month.R (takes about 8-10 minutes)
    - scripts/2_formatting/02formatting_spillover.R (takes about 27-30 minutes)

NOTE: You can also download the whole archive and follow steps 2 through 4. All
cleaning and analysis will be rerun. The cleaning and analysis scripts will
overwrite the existing data and output objects.

Replication from Raw Data (Including .Do file)  --------------------------------

To replicate the project in an automated way from the raw data files, including
the 1 .do file that does some preliminary cleaning on the baseline market vendor
data:

1. Download at least the following folders (and their contents) and files:
       - data/1_raw/
       - scripts/00_misc/
       - scripts/0_functions/
       - scripts/1_cleaning/
       - scripts/2_formatting/
       - scripts/3_analysis/
       - replicate_all.R
       - replication_marketing_taxation.Rproj
2. Open scripts/1_cleaning/00cleaning_baseline_prelim.do in Stata. 
  - Change the file paths to match your computer's structure in lines 27 and
  1164.
  - Run the .do file.
  - Make sure that the output file marketvendor_BASELINEFINAL_noID_clean_v5.dta
  has been saved and has replaced the existing version in data/1_raw/.
3. Open replication_marketing_taxation.Rproj in RStudio.
4. If necessary, install required packages by sourcing 
scripts/00_misc/install_packages.R. This will install packages on your computer.
5. Run the code in replicate_all.R.

Replication of Main Analysis Only ----------------------------------------------

In order to replicate only the results in the paper:

1. Download and unzip, if necessary, the entire replication archive.
2. Open replication_marketing_taxation.Rproj in RStudio.
3. If necessary, install required packages by sourcing 
scripts/00_misc/install_packages.R. This will install packages on your computer.
4. Source scripts/3_analysis/analysis_main.R.

Replication of Appendix Analysis Only ------------------------------------------

In order to replicate only the results in the supplementary appendix:

1. Download and unzip, if necessary, the entire replication archive.
2. Open replication_marketing_taxation.Rproj in RStudio.
3. If necessary, install required packages by sourcing 
scripts/00_misc/install_packages.R. This will install packages on your computer.
4. Source scripts/3_analysis/analysis_appendix.R.

Replication in Part ------------------------------------------------------------

If the whole archive is downloaded (and unzipped, if necessary), all scripts are
immediately executable and any part of the cleaning or analysis can be 
replicated, out of sequence if desired, **as long as the code is run within the 
replication_marketing_taxation R project.**

Within the scripts/1_cleaning/ and scripts/2_formatting/ folders, the script
files all have numeric prefixes (e.g., 01cleaning_endline.R, 
02cleaning_market_month.R). These prefixes matter only when replicating from
scratch, because some of the cleaning scripts must be executed in prefix order.
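
To illustrate what prefix order means, the sketch below sorts script file names
by their leading digits. The function name prefix_order() is hypothetical and
not part of the archive. Breaking ties between scripts with the same prefix
alphabetically is an assumption made for illustration; replicate_all.R remains
the authoritative run order.

```r
# Hypothetical helper: order script file names by numeric prefix,
# breaking ties alphabetically (an assumption for illustration only).
prefix_order <- function(files) {
  # Extract the leading digits of each file name, e.g. "02" -> 2.
  prefix <- as.integer(sub("^([0-9]+).*$", "\\1", basename(files)))
  files[order(prefix, basename(files), method = "radix")]
}

# Example with file names from scripts/1_cleaning/:
prefix_order(c(
  "02cleaning_market_month.R",
  "01cleaning_scoping.R",
  "00cleaning_treatment_groups.R"
))
```

From the project root, the same helper could be applied to a real folder with
prefix_order(list.files("scripts/1_cleaning", pattern = "\\.R$")).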

Required Programs --------------------------------------------------------------

In order to replicate the results, R and RStudio must be installed.

R version at time of last execution of the code in this archive by research
team (parts of code may not fully work on past or future versions of R): 4.4.1

RStudio version at time of last execution of the code in this archive by research
team (parts of code may not fully work on past or future versions of RStudio): 
2024.12.1+563 

In order to run the .do file, a version of Stata must be installed.

Stata version at the time of the last execution of the .do file: Stata 14

Required R Packages ------------------------------------------------------------

The following R packages (and any of their dependencies) are required to 
successfully replicate the results. Each entry gives the package version in use
when the code in this archive was last executed.

- {dplyr_1.1.4}
- {readr_2.1.5}
- {tidyr_1.3.1}
- {purrr_1.0.2}
- {tibble_3.2.1}
- {stringr_1.5.1}
- {ivreg_0.6-4}
- {broom_1.0.7}
- {forcats_1.0.0}
- {ggplot2_3.5.1}
- {car_3.1-3}
- {xtable_1.8-4}
- {stargazer_5.2.3}
- {multiwayvcov_1.2.3}
- {haven_2.5.4}
- {labelled_2.13.0}
- {lme4_1.1-35.5}
- {patchwork_1.3.0}
- {readxl_1.4.3}
- {hms_1.1.3}
- {sp_2.1-4}

They can be installed by running the code in scripts/00_misc/install_packages.R.
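
A minimal sketch for comparing installed package versions with the versions
listed above. The helper version_mismatches() is hypothetical and not part of
the archive. Only a few of the listed packages are shown; the rest can be added
to the vector in the same way.

```r
# Versions copied from the list above (subset shown for brevity).
required <- c(
  dplyr = "1.1.4",
  ivreg = "0.6-4",
  lme4  = "1.1-35.5",
  sp    = "2.1-4"
)

# Hypothetical helper: returns a data frame of packages whose installed
# version differs from the listed one (NA means not installed).
version_mismatches <- function(required,
                               installed = utils::installed.packages()[, "Version"]) {
  # Look up each required package by name; missing packages yield NA.
  have <- unname(installed[names(required)])
  same <- vapply(seq_along(required), function(i) {
    !is.na(have[i]) &&
      package_version(required[[i]]) == package_version(have[i])
  }, logical(1))
  data.frame(
    package   = names(required)[!same],
    listed    = unname(required)[!same],
    installed = have[!same],
    stringsAsFactors = FALSE
  )
}

version_mismatches(required)
```

Versions are compared with package_version(), so "0.6-4" and "0.6.4" count as
the same version. A mismatch does not necessarily mean the code will fail; it
only flags where the local setup differs from the one used by the research
team.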

# Seed Locations ---------------------------------------------------------------

A seed is set in line 102 of scripts/2_formatting/02formatting_spillover.R to
ensure replicability of the simulation-based inverse probability weights in
the spillover analysis.

A seed is set in line 2315 of scripts/3_analysis/analysis_appendix.R to ensure
consistent and reproducible jitter in Figure N2.
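
The effect of setting a seed can be illustrated with a short sketch. The seed
value below is arbitrary and is not the one used in the archive's scripts, and
base R's jitter() stands in for whatever jitter the figure code applies, which
may be produced differently.

```r
# Arbitrary seed for illustration; not the value used in the archive.
set.seed(2024)
first <- jitter(rep(1, 5))

# Resetting the same seed reproduces the same random draws.
set.seed(2024)
second <- jitter(rep(1, 5))
identical(first, second)

# Without resetting the seed, the next draws differ from the first ones.
third <- jitter(rep(1, 5))
identical(first, third)
```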